79 research outputs found

    Three Essays on Big Data Consumer Analytics in E-Commerce

    Consumers are increasingly spending more time and money online. Business-to-consumer e-commerce is growing at an average of 20 percent each year and reached 1.5 trillion dollars globally in 2014. Given the scale and growth of consumer online purchase and usage data, firms' ability to understand and utilize this data is becoming an essential competitive strategy. However, large-scale data analytics in e-commerce is still at a nascent stage, and there is much to be learned in all aspects of e-commerce. Successful analytics on big data often require a combination of both data mining and econometrics: data mining to reduce or structure large-scale data (from unstructured data such as text, photo, and video), and econometric analyses to truly understand and assign causality to interesting patterns. In my dissertation, I study how firms can better utilize big data analytics and specific applications of machine learning techniques for improved e-commerce using theory-driven econometric and experimental studies. I show that e-commerce managers can now formulate data-driven strategies for many aspects of business, from cross-selling via recommenders on sales sites to increasing brand awareness and leads via social media content-engineered marketing. These results are readily actionable with far-reaching economic consequences.

    Mathematical modeling of translation initiation for the estimation of its efficiency to computationally design mRNA sequences with desired expression levels in prokaryotes

    Background: Within the emerging field of synthetic biology, engineering paradigms have recently been used to design biological systems with novel functionalities. One of the essential challenges hampering the construction of such systems is the need to precisely optimize protein expression levels for robust operation. However, it is difficult to design mRNA sequences for expression at targeted protein levels, since even a few nucleotide modifications around the start codon may alter translational efficiency and dramatically (up to 250-fold) change protein expression. Previous studies have used ad hoc approaches (e.g., random mutagenesis) to obtain the desired translational efficiencies for mRNA sequences. Hence, the development of a mathematical methodology capable of estimating translational efficiency would greatly facilitate the future design of mRNA sequences aimed at yielding desired protein expression levels. Results: We herein propose a mathematical model that focuses on translation initiation, which is the rate-limiting step in translation. The model uses mRNA-folding dynamics and ribosome-binding dynamics to estimate translational efficiencies solely from mRNA sequence information. We confirmed the feasibility of our model using previously reported expression data on the MS2 coat protein. For further confirmation, we used our model to design 22 luxR mRNA sequences predicted to have diverse translation efficiencies ranging from 10^-5 to 1. The expression levels of these sequences were measured in Escherichia coli and found to be highly correlated (R^2 = 0.87) with their estimated translational efficiencies. Moreover, we used our computational method to successfully transform a low-expressing DsRed2 mRNA sequence into a high-expressing mRNA sequence by maximizing its translational efficiency through the modification of only eight nucleotides upstream of the start codon. Conclusions: We herein describe a mathematical model that uses mRNA sequence information to estimate translational efficiency. This model could be used to design best-fit mRNA sequences having a desired protein expression level, thereby facilitating protein over-production in biotechnology or the protein expression-level optimization necessary for the construction of robust networks in synthetic biology.

    How Do Recommender Systems Affect Sales Diversity? A Cross-Category Investigation via Randomized Field Experiment

    We investigate the impact of collaborative filtering recommender algorithms (e.g., Amazon's “Customers who bought this item also bought”) commonly used in e-commerce on sales diversity. We use data from a randomized field experiment run on a top retailer in North America across 82,290 SKUs and 1,138,238 users. We report four main findings. First, we demonstrate across a wide range of product categories that the use of traditional collaborative filters (or CFs) is associated with a decrease in sales diversity relative to a world without product recommendations. Further, the design of the CF matters. CFs based on purchase data are associated with a greater effect size than those based on product views. Second, the decrease in aggregate sales diversity may not always be accompanied by a corresponding decrease in individual-level consumption diversity. In fact, it is even possible for individual consumption diversity to increase while aggregate sales diversity decreases. Third, co-purchase network analysis shows that recommenders can help individuals explore new products but similar users end up exploring the same kinds of products resulting in the concentration bias at the aggregate level. Fourth and finally, there is a difference between absolute and relative impact on niche items. Specifically, absolute sales and views for niche items in fact increase, but their gains are smaller compared to the gains in views and sales for popular items. Thus, while niche items gain in absolute terms, they lose out in terms of market shares.

    Advertising Content and Consumer Engagement on Social Media: Evidence from Facebook

    We describe the effect of social media advertising content on customer engagement using data from Facebook. We content-code 106,316 Facebook messages across 782 companies, using a combination of Amazon Mechanical Turk and natural language processing algorithms. We use this data set to study the association of various kinds of social media marketing content with user engagement (defined as Likes, comments, shares, and click-throughs) with the messages. We find that inclusion of widely used content related to brand personality, like humor and emotion, is associated with higher levels of consumer engagement (Likes, comments, shares) with a message. We find that directly informative content, like mentions of price and deals, is associated with lower levels of engagement when included in messages in isolation, but higher engagement levels when provided in combination with brand personality–related attributes. Also, certain directly informative content, such as deals and promotions, drives consumers’ path to conversion (click-throughs). These results persist after incorporating corrections for the nonrandom targeting of Facebook’s EdgeRank (News Feed) algorithm and so reflect more closely user reaction to content than Facebook’s behavioral targeting. Our results suggest that there are benefits to content engineering that combines informative characteristics that help in obtaining immediate leads (via improved click-throughs) with brand personality–related content that helps in maintaining future reach and branding on the social media site (via improved engagement). These results inform content design strategies. Separately, the methodology we apply to content-code text is useful for future studies utilizing unstructured data such as advertising content or product reviews.

    Will the Global Village Fracture Into Tribes? Recommender Systems and Their Effects on Consumer Fragmentation

    Personalization is becoming ubiquitous on the World Wide Web. Such systems use statistical techniques to infer a customer's preferences and recommend content best suited to him (e.g., “Customers who liked this also liked…”). A debate has emerged as to whether personalization has drawbacks. By making the Web hyperspecific to our interests, does it fragment Internet users, reducing shared experiences and narrowing media consumption? We study whether personalization is in fact fragmenting the online population. Surprisingly, it does not appear to do so in our study. Personalization appears to be a tool that helps users widen their interests, which in turn creates commonality with others. This increase in commonality occurs for two reasons, which we term volume and product-mix effects. The volume effect is that consumers simply consume more after personalized recommendations, increasing the chance of having more items in common. The product-mix effect is that, conditional on volume, consumers buy a more similar mix of products after recommendations.

    How Much Is An Image Worth? An Empirical Analysis of Property’s Image Aesthetic Quality on Demand at AirBNB

    Consumers using sharing economy platforms such as Airbnb face high product uncertainty and search costs. To ameliorate these issues, Airbnb has implemented many strategies, such as taking professional, high-quality photos for hosts and labeling them as verified. In this paper we study the impact of having a unit listing's photos verified. To assess the aesthetic quality of images, we use machine learning techniques. Employing Difference-in-Difference analysis, we find that on average, rooms with verified photos are 9% more frequently booked. We further separate the effect of photo verification from photo quality and room reviews and find an extra $2,455 in yearly earnings brought by high photo quality. Lastly, we look at the properties in the same neighborhood and find asymmetric spillover effects. On the neighborhood level, the results suggest higher overall demand if more rooms have verified photos.

    Methicillin-Resistant Staphylococcus aureus Blood Isolates Harboring a Novel Pseudo-staphylococcal Cassette Chromosome mec Element

    The aim of this work was to assess a novel pseudo-staphylococcal cassette chromosome mec (ΨSCCmec) element in methicillin-resistant Staphylococcus aureus (MRSA) blood isolates. Community-associated MRSA E16SA093 and healthcare-associated MRSA F17SA003 isolates were recovered from the blood specimens of patients with S. aureus bacteremia in 2016 and in 2017, respectively. Antimicrobial susceptibility was determined via the disk diffusion method, and SCCmec typing was conducted by multiplex polymerase chain reaction. Whole genome sequencing was carried out by single molecule real-time long-read sequencing. Both isolates belonged to sequence type 72 and agr-type I, and they were negative for Panton-Valentine leukocidin and toxic shock syndrome toxin. The spa-types of E16SA093 and F17SA003 were t324 and t2460, respectively. They had an SCCmec IV-like element devoid of the cassette chromosome recombinase (ccr) gene complex, designated as ΨSCCmecE16SA093. The element was derived from SCCmec type IV through the deletion of the ccr gene complex and a 7.0- and 31.9-kb portion of each chromosome. The deficiency of the ccr gene complex in the SCCmec unit likely results in mobility loss, which would be an adaptive evolutionary mechanism. The dissemination of this clone should be monitored closely.

    PLPD: reliable protein localization prediction from imbalanced and overlapped datasets

    Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient prediction method for protein subcellular localization is highly desirable owing to the need for large-scale genome analysis. From a machine learning point of view, a dataset of protein localization has several characteristics: the dataset has many classes (there are more than 10 localizations in a cell), it is a multi-label dataset (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins in each localization is remarkably different). Even though many previous works have addressed the prediction of protein subcellular localization, none of them effectively tackles these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD) for the prediction of protein localization, which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. Results on various datasets derived from the experiments of Huh et al. (2003) indicate that the proposed PLPD method represents a different approach that might play a complementary role to the existing methods, such as the Nearest Neighbor method and discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted 138 proteins whose subcellular localizations could not be clearly observed by the experiments of Huh et al. (2003).